Building RNN, LSTM, and GRU for time series using PyTorch¶
In this notebook, I’d like to give you a bit of an introduction to some recurrent network architectures, namely vanilla RNN, LSTM, and GRU, and help you get started building your deep learning models for time-series forecasting using PyTorch.
When I started out, it was somewhat difficult to find the relevant pieces of information and code samples for PyTorch from the get-go, which is usually easier with frameworks that have been around for longer, say TensorFlow. So, I decided to put together the things I would have liked to know earlier.
Importing libraries¶
import torch
import torch.nn as nn
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from datetime import datetime
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"{device} is available.")
Loading the dataset¶
Well, I suppose we need some time-series data to start with. Be it payment transactions or stock exchange data, time-series data is everywhere. One such public dataset is PJM’s Hourly Energy Consumption data, a univariate time-series dataset of 10+ years of hourly observations collected from different US regions. I’ll be using the PJM East region data, which originally has the hourly energy consumption data from 2001 to 2018, but any of the datasets provided in the link should work.
The following cell pops up an upload widget, so that you can upload your data to this notebook.
from google.colab import files
data_to_load = files.upload()
If you’d like to work with a dataset other than PJME_hourly.csv, you can change the field in the next cell accordingly.
import io
df = pd.read_csv(io.BytesIO(data_to_load['PJME_hourly.csv']))
import plotly.graph_objs as go
from plotly.offline import iplot
def plot_dataset(df, title):
    data = []
    value = go.Scatter(
        x=df.index,
        y=df.value,
        mode="lines",
        name="values",
        marker=dict(),
        text=df.index,
        line=dict(color="rgba(0,0,0, 0.3)"),
    )
    data.append(value)

    layout = dict(
        title=title,
        xaxis=dict(title="Date", ticklen=5, zeroline=False),
        yaxis=dict(title="Value", ticklen=5, zeroline=False),
    )

    fig = dict(data=data, layout=layout)
    iplot(fig)
df = df.set_index(['Datetime'])
df = df.rename(columns={'PJME_MW': 'value'})
df.index = pd.to_datetime(df.index)
if not df.index.is_monotonic_increasing:
    df = df.sort_index()
plot_dataset(df, title='PJM East (PJME) Region: estimated energy consumption in Megawatts (MW)')
The next step is to generate feature columns that transform our univariate dataset into a multivariate one; we will convert this time series into a supervised learning problem, if you will. In some datasets, features such as hourly temperature, humidity, or precipitation are readily available. In our dataset, however, no extra information that could help us predict energy consumption is given. So, it falls to us to create such predictors, i.e., feature columns.
I’ll show you two popular ways to generate features: passing lagged observations as features and creating date time features from the DateTime index. Both approaches have their advantages and disadvantages, and each may prove more useful depending on the task at hand.
Generating time-lagged observations¶
Let’s start with using time steps as features. In other words, we’re trying to predict the next value, X(t+n), from the previous n observations X(t), X(t+1), …, X(t+n-1). What we need to do, then, is simply create n columns holding the preceding observations. Luckily, Pandas provides the shift() method to shift the values in a column. So, we can write a for loop that creates such lagged observations by shifting the value column n times, and then remove the first n rows, which contain missing values.
After setting the number of input features, i.e., lagged observations, to 100, we get the following DataFrame with 101 columns, one for the actual value, and the rest for the preceding 100 observations at each row.
def generate_time_lags(df, n_lags):
    df_n = df.copy()
    for n in range(1, n_lags + 1):
        df_n[f"lag{n}"] = df_n["value"].shift(n)
    df_n = df_n.iloc[n_lags:]
    return df_n
input_dim = 100
df_timelags = generate_time_lags(df, input_dim)
df_timelags
| Datetime | value | lag1 | lag2 | lag3 | … | lag99 | lag100 |
|---|---|---|---|---|---|---|---|
| 2002-01-05 05:00:00 | 26822.0 | 26669.0 | 27034.0 | 27501.0 | … | 29265.0 | 30393.0 |
| 2002-01-05 06:00:00 | 27399.0 | 26822.0 | 26669.0 | 27034.0 | … | 28357.0 | 29265.0 |
| … | … | … | … | … | … | … | … |
| 2018-08-02 23:00:00 | 38500.0 | 41552.0 | 43256.0 | 44057.0 | … | 39089.0 | 40517.0 |
| 2018-08-03 00:00:00 | 35486.0 | 38500.0 | 41552.0 | 43256.0 | … | 37870.0 | 39089.0 |

145266 rows × 101 columns
Generating date/time predictors¶
Despite its name, feature engineering is generally more art than science. Nonetheless, there are some rules of thumb that can guide data scientists and the like. My goal in this section is not to go through all such practices here, but just to demonstrate a couple of them for you to experiment on your own. In effect, feature engineering is very much dependent on the domain that you’re working in, possibly requiring the creation of a different set of features for the task at hand.
Having a univariate time-series dataset at hand, it seems only logical to start by generating date and time features. As we have already converted the index of the dataset into Pandas’ DatetimeIndex type, a series of DateTime objects, we can easily create new features from the index values, such as hour, day, month, day of the week, and week of the year, as follows.
df_features = (
    df
    .assign(hour=df.index.hour)
    .assign(day=df.index.day)
    .assign(month=df.index.month)
    .assign(day_of_week=df.index.dayofweek)
    .assign(week_of_year=df.index.isocalendar().week)
)
Although passing date and time features to the model without any transformation may work in practice, it makes it harder for the model to learn the interdependencies between these features. For us humans, it is rather straightforward to see that hours, days, weeks, and months follow somewhat cyclical patterns. While it is trivial for us to say that December is followed by January, it may not be clear to an algorithm that the first month of the year comes after the 12th one. One can easily come up with many more examples for that matter. This makes good feature engineering crucial for building deep learning models, and even more so for traditional machine learning models.
One-hot encoding¶
One way to encode DateTime features is to treat them as categorical variables and add a new binary variable for each unique value, a technique widely known as one-hot encoding. Suppose you applied one-hot encoding to your month column, which ranges from 1 to 12. In this case, 12 new month columns are created, say [Jan, Feb, …, Dec], and only one of these columns has the value 1 while the rest are zeros. For instance, a DateTime value from February would have the second of these encoded columns set to 1, as in [0, 1, …, 0]. Using Pandas’ get_dummies method, we can easily create one-hot encoded columns from a given dataset.
def onehot_encode_pd(df, cols):
    for col in cols:
        dummies = pd.get_dummies(df[col], prefix=col)
        # Concatenate inside the loop so every column's dummies are kept
        df = pd.concat([df, dummies], axis=1)
    return df.drop(columns=cols)
df_features = onehot_encode_pd(df_features, ['month','day','day_of_week','week_of_year'])
df_features.columns
Index(['value', 'hour', 'week_of_year_1', 'week_of_year_2', 'week_of_year_3',
'week_of_year_4', 'week_of_year_5', 'week_of_year_6', 'week_of_year_7',
'week_of_year_8', 'week_of_year_9', 'week_of_year_10',
'week_of_year_11', 'week_of_year_12', 'week_of_year_13',
'week_of_year_14', 'week_of_year_15', 'week_of_year_16',
'week_of_year_17', 'week_of_year_18', 'week_of_year_19',
'week_of_year_20', 'week_of_year_21', 'week_of_year_22',
'week_of_year_23', 'week_of_year_24', 'week_of_year_25',
'week_of_year_26', 'week_of_year_27', 'week_of_year_28',
'week_of_year_29', 'week_of_year_30', 'week_of_year_31',
'week_of_year_32', 'week_of_year_33', 'week_of_year_34',
'week_of_year_35', 'week_of_year_36', 'week_of_year_37',
'week_of_year_38', 'week_of_year_39', 'week_of_year_40',
'week_of_year_41', 'week_of_year_42', 'week_of_year_43',
'week_of_year_44', 'week_of_year_45', 'week_of_year_46',
'week_of_year_47', 'week_of_year_48', 'week_of_year_49',
'week_of_year_50', 'week_of_year_51', 'week_of_year_52',
'week_of_year_53'],
dtype='object')
Though quite useful for encoding categorical features, one-hot encoding does not fully capture the cyclical patterns in DateTime features. It simply creates categorical buckets, if you will, and lets the model learn from these seemingly independent features. Encoding the day of the week in this manner, for instance, loses the information that Monday is closer to Tuesday than to Wednesday.
For some use cases, this may not matter too much, indeed. In fact, with enough data, training time, and model complexity, the model may learn such relationships between such features independently. But there is also another way.
Generating cyclical features (sin/cos transformation)¶
Some data is inherently cyclical, and ours is no exception. Be it hours, days, weeks, or months, they all follow periodic cycles. Again, this is trivial for us to see, but not so much for machine learning models. The problem simply becomes: how can we tell an algorithm that hours 23 and 0 are as close to each other as hour 1 is to hour 2?
The gist is to create two new features for a given DateTime feature, say the hour of the day, by calculating its sine and cosine transforms. Instead of the original hour value, the model then uses these two transformed values, which preserve the cyclical nature of the feature. To see how and why this works, feel free to have a look at Pierre-Louis’ or David’s blog posts on the matter, which explain the concept in more detail.
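Before writing the transformation function, here is a quick numeric sanity check (a self-contained sketch; the helper names are made up for illustration) showing that, in (sin, cos) space, hour 23 is exactly as close to hour 0 as hour 1 is to hour 2:

```python
import numpy as np

def encode_hour(hour, period=24):
    # Map an hour onto the unit circle as a (sin, cos) pair
    angle = 2 * np.pi * hour / period
    return np.array([np.sin(angle), np.cos(angle)])

def encoded_distance(h1, h2):
    # Euclidean distance between two encoded hours
    return np.linalg.norm(encode_hour(h1) - encode_hour(h2))

print(encoded_distance(23, 0))  # same as the 1-to-2 distance below
print(encoded_distance(1, 2))
# With the raw values, |23 - 0| = 23 would wrongly suggest the hours are far apart.
```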
def generate_cyclical_features(df, col_name, period, start_num=0):
    kwargs = {
        f'sin_{col_name}': np.sin(2 * np.pi * (df[col_name] - start_num) / period),
        f'cos_{col_name}': np.cos(2 * np.pi * (df[col_name] - start_num) / period),
    }
    return df.assign(**kwargs).drop(columns=[col_name])
df_features = generate_cyclical_features(df_features, 'hour', 24, 0)
# df_features = generate_cyclical_features(df_features, 'day_of_week', 7, 0)
# df_features = generate_cyclical_features(df_features, 'month', 12, 1)
# df_features = generate_cyclical_features(df_features, 'week_of_year', 52, 0)
df_features.head()
| Datetime | value | week_of_year_1 | … | week_of_year_53 | sin_hour | cos_hour |
|---|---|---|---|---|---|---|
| 2002-01-01 01:00:00 | 30393.0 | 1 | … | 0 | 0.258819 | 0.965926 |
| 2002-01-01 02:00:00 | 29265.0 | 1 | … | 0 | 0.500000 | 0.866025 |
| 2002-01-01 03:00:00 | 28357.0 | 1 | … | 0 | 0.707107 | 0.707107 |
| 2002-01-01 04:00:00 | 27899.0 | 1 | … | 0 | 0.866025 | 0.500000 |
| 2002-01-01 05:00:00 | 28057.0 | 1 | … | 0 | 0.965926 | 0.258819 |
Splitting the data into test, validation, and train sets¶
After creating feature columns, be it time-lagged observations or date/time features, we split the dataset into three different datasets: training, validation, and test sets.
from sklearn.model_selection import train_test_split
def feature_label_split(df, target_col):
    y = df[[target_col]]
    X = df.drop(columns=[target_col])
    return X, y

def train_val_test_split(df, target_col, test_ratio):
    val_ratio = test_ratio / (1 - test_ratio)
    X, y = feature_label_split(df, target_col)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=test_ratio, shuffle=False)
    X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=val_ratio, shuffle=False)
    return X_train, X_val, X_test, y_train, y_val, y_test
X_train, X_val, X_test, y_train, y_val, y_test = train_val_test_split(df_features, 'value', 0.2)
Applying scale transformation¶
Scaling the values in your dataset is a highly recommended practice for neural networks, as it is for other machine learning techniques. It speeds up the learning by making it easier for the model to update the weights. You can easily do that by using Scikit-learn’s scalers, MinMaxScaler, RobustScaler, StandardScaler, and the like. For more information on the effects of each scaler, please refer to the official documentation.
And, here is a cool trick if you’re looking for a way to switch between scalers quickly. Get yourself comfortable with the switcher function; we may use it again later on.
from sklearn.preprocessing import MinMaxScaler, StandardScaler, MaxAbsScaler, RobustScaler
def get_scaler(scaler):
    scalers = {
        "minmax": MinMaxScaler,
        "standard": StandardScaler,
        "maxabs": MaxAbsScaler,
        "robust": RobustScaler,
    }
    return scalers.get(scaler.lower())()
scaler = get_scaler('minmax')
X_train_arr = scaler.fit_transform(X_train)
X_val_arr = scaler.transform(X_val)
X_test_arr = scaler.transform(X_test)

# Re-fitting on the target overwrites the parameters learned from X; the scaler
# then holds y's parameters, which is what we need to inverse-transform the
# predictions later. Use a separate scaler instance if you need to invert both.
y_train_arr = scaler.fit_transform(y_train)
y_val_arr = scaler.transform(y_val)
y_test_arr = scaler.transform(y_test)
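Note that the same scaler object is re-fitted on the target above, so the parameters it stores afterwards belong to y; that is handy because we’ll need inverse_transform later to map the predictions back to megawatts. Here is a minimal, self-contained sketch of that round trip with made-up numbers:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

y = np.array([[30000.0], [35000.0], [40000.0]])  # toy target values in MW

scaler = MinMaxScaler()
y_scaled = scaler.fit_transform(y)               # values mapped into [0, 1]
y_restored = scaler.inverse_transform(y_scaled)  # back to the original units

print(y_scaled.ravel())    # [0.  0.5 1. ]
print(y_restored.ravel())  # [30000. 35000. 40000.]
```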
Loading the data into DataLoaders¶
After you standardize your data, you are usually good to go. Not so fast, this time. After spending quite some time working with PyTorch and going through others’ code on the internet, I noticed most people ended up doing the matrix operations for mini-batch training, i.e., slicing the data into smaller batches, using NumPy. You may think that’s what NumPy is for; I get it. But there is also a more elegant, PyTorch-native way of doing it, which certainly gets much less attention than it deserves, in my opinion.
PyTorch’s DataLoader class, a Python iterable over a Dataset, loads the data and splits it into batches for you to do mini-batch training. The most important argument of the DataLoader constructor is the dataset, which indicates the dataset object to load the data from. There are mainly two types of datasets: map-style datasets and iterable-style datasets.
In this tutorial, I’ll use the former, but feel free to check both out in the official documentation. It is also possible to write your own Dataset or DataLoader classes for your requirements, but that’s beyond the scope of this post, as the built-in constructors will more than suffice. Here’s a link to the official tutorial on the topic, though.
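To give a flavor of what writing your own map-style Dataset involves, here is a minimal sketch; the class name and the random stand-in data are made up for illustration, and all you really must implement is __len__ and __getitem__:

```python
import torch
from torch.utils.data import Dataset, DataLoader

class SequenceDataset(Dataset):
    """A minimal map-style dataset wrapping feature and target tensors."""
    def __init__(self, features, targets):
        self.features = features
        self.targets = targets

    def __len__(self):
        return len(self.features)

    def __getitem__(self, idx):
        return self.features[idx], self.targets[idx]

# Random stand-in data, just to show the mechanics
X = torch.randn(100, 8)
y = torch.randn(100, 1)
loader = DataLoader(SequenceDataset(X, y), batch_size=16, shuffle=False)
x_batch, y_batch = next(iter(loader))
print(x_batch.shape, y_batch.shape)  # torch.Size([16, 8]) torch.Size([16, 1])
```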
For now, I’ll be using the class called TensorDataset, a dataset class wrapping the tensors. Since Scikit-learn’s scalers output NumPy arrays, I need to convert them into Torch tensors to load them into TensorDatasets. After creating Tensor datasets for each dataset, I’ll use them to create my DataLoaders. You may notice an extra DataLoader with the batch size of 1 and wonder why the hell we need it. I’ll get to that in a bit.
from torch.utils.data import TensorDataset, DataLoader
batch_size = 64
train_features = torch.Tensor(X_train_arr)
train_targets = torch.Tensor(y_train_arr)
val_features = torch.Tensor(X_val_arr)
val_targets = torch.Tensor(y_val_arr)
test_features = torch.Tensor(X_test_arr)
test_targets = torch.Tensor(y_test_arr)
train = TensorDataset(train_features, train_targets)
val = TensorDataset(val_features, val_targets)
test = TensorDataset(test_features, test_targets)
train_loader = DataLoader(train, batch_size=batch_size, shuffle=False, drop_last=True)
val_loader = DataLoader(val, batch_size=batch_size, shuffle=False, drop_last=True)
test_loader = DataLoader(test, batch_size=batch_size, shuffle=False, drop_last=True)
test_loader_one = DataLoader(test, batch_size=1, shuffle=False, drop_last=True)
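One thing to keep in mind: each batch these loaders yield is 2-D, of shape (batch_size, n_features), while RNNs with batch_first=True expect 3-D input of shape (batch_size, seq_len, input_dim). A quick self-contained check with stand-in data; the reshape below is one way to add the missing sequence dimension, so treat it as a sketch:

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

# Stand-in arrays shaped like our scaled features (100 lags) and targets
features = torch.randn(256, 100)
targets = torch.randn(256, 1)

loader = DataLoader(TensorDataset(features, targets), batch_size=64, shuffle=False, drop_last=True)
x_batch, y_batch = next(iter(loader))
print(x_batch.shape)  # torch.Size([64, 100])

# Insert a sequence-length dimension of 1 so the batch fits an RNN's expected input
x_rnn = x_batch.view(64, 1, 100)
print(x_rnn.shape)    # torch.Size([64, 1, 100])
```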
Defining the RNN model classes¶
I don’t think I can ever do justice to RNNs if I try to explain the nitty-gritty of how they work in just a few sentences here. Fortunately, there are several well-written articles on these networks for those who are looking for a place to start, Andrej Karpathy’s The Unreasonable Effectiveness of Recurrent Neural Networks, Chris Olah’s Understanding LSTM networks, and Michael Phi’s Illustrated Guide to LSTM’s and GRU’s: A step by step explanation are a few that come to mind.
As humans, we don’t start our thinking from scratch every second; we understand each new piece of information based on what came before. Traditional neural networks, however, can’t do this: they start from scratch every time they are given a task, pretty much like Leonard in Memento, you see. RNNs address this shortcoming. To make a gross oversimplification, they do so by looping the information from one step of the network to the next, allowing information to persist within the network. This makes them a pretty strong candidate for solving various problems involving sequential data, such as speech recognition, language translation, or time-series forecasting, as we will see in a bit.
Vanilla RNN¶
By extending PyTorch’s nn.Module, the base class for all neural network modules, we define our RNN module as follows. Our RNN module will have one or more RNN layers followed by a fully connected layer that converts the RNN output into the desired output shape. We also need to define the forward propagation as a class method called forward(), which passes the inputs and a zero-initialized hidden state through the layers. PyTorch, in turn, automatically creates and computes the backpropagation function backward() for us.
class RNNModel(nn.Module):
    def __init__(self, input_dim, hidden_dim, layer_dim, output_dim, dropout_prob):
        """The __init__ method that initiates an RNN instance.

        Args:
            input_dim (int): The number of nodes in the input layer
            hidden_dim (int): The number of nodes in each layer
            layer_dim (int): The number of layers in the network
            output_dim (int): The number of nodes in the output layer
            dropout_prob (float): The probability of nodes being dropped out
        """
        super(RNNModel, self).__init__()

        # Defining the number of layers and the nodes in each layer
        self.hidden_dim = hidden_dim
        self.layer_dim = layer_dim

        # RNN layers
        self.rnn = nn.RNN(
            input_dim, hidden_dim, layer_dim, batch_first=True, dropout=dropout_prob
        )
        # Fully connected layer
        self.fc = nn.Linear(hidden_dim, output_dim)

    def forward(self, x):
        """The forward method takes input tensor x and does forward propagation

        Args:
            x (torch.Tensor): The input tensor of the shape (batch size, sequence length, input_dim)

        Returns:
            torch.Tensor: The output tensor of the shape (batch size, output_dim)
        """
        # Initializing hidden state for first input with zeros
        h0 = torch.zeros(self.layer_dim, x.size(0), self.hidden_dim).requires_grad_().to(device)

        # Forward propagation by passing in the input and hidden state into the model
        out, h0 = self.rnn(x, h0.detach())

        # Selecting the last time step's output, of shape (batch_size, hidden_dim),
        # so that it can fit into the fully connected layer
        out = out[:, -1, :]

        # Convert the final state to our desired output shape (batch_size, output_dim)
        out = self.fc(out)
        return out
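To make the shape handling in forward() concrete, here is a self-contained sanity check on random data using the same building blocks, nn.RNN and nn.Linear; the dimensions are arbitrary, chosen only for illustration:

```python
import torch
import torch.nn as nn

batch_size, seq_len, input_dim = 4, 1, 100
hidden_dim, layer_dim, output_dim = 64, 2, 1

rnn = nn.RNN(input_dim, hidden_dim, layer_dim, batch_first=True, dropout=0.2)
fc = nn.Linear(hidden_dim, output_dim)

x = torch.randn(batch_size, seq_len, input_dim)
h0 = torch.zeros(layer_dim, batch_size, hidden_dim)

out, hn = rnn(x, h0)
print(out.shape)                # torch.Size([4, 1, 64]): one hidden vector per time step
prediction = fc(out[:, -1, :])  # keep only the last time step
print(prediction.shape)         # torch.Size([4, 1]): one prediction per sequence
```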
Vanilla RNNs have one shortcoming, though. Simple RNNs can connect previous information to the current step as long as the temporal gap between the relevant past information and the present is small. As that gap grows, RNNs become less and less capable of learning such long-term dependencies, largely because of the so-called vanishing gradient problem. This is where LSTMs come to the rescue.
Long Short-Term Memory (LSTM)¶
Long Short-Term Memory, LSTM for short, is a special type of recurrent neural network capable of learning long-term dependencies, and it tends to work much better than the standard version on a wide variety of tasks. RNNs on steroids, so to speak.
The main difference from the standard version is that, in addition to the hidden state, LSTMs have a cell state, which works like a conveyor belt carrying relevant information from earlier steps to later steps. Along the way, new information is added to or removed from the cell state via the input and forget gates, two small neural networks that determine which information is relevant. From the implementation standpoint, you don’t really have to bother with such details. All you need to add is the cell state in your forward() method.
class LSTMModel(nn.Module):
"""LSTMModel class extends nn.Module class and works as a constructor for LSTMs.
LSTMModel class initiates a LSTM module based on PyTorch's nn.Module class.
It has only two methods, namely __init__() and forward(). While the __init__()
method initiates the model with the given input parameters, the forward()
method defines how the forward propagation needs to be calculated.
Since PyTorch automatically handles backpropagation, there is no need
to define a backward method.
Attributes:
hidden_dim (int): The number of nodes in each layer
layer_dim (int): The number of layers in the network
lstm (nn.LSTM): The LSTM model constructed with the input parameters.
fc (nn.Linear): The fully connected layer to convert the final state
of LSTMs to our desired output shape.
"""
def __init__(self, input_dim, hidden_dim, layer_dim, output_dim, dropout_prob):
"""The __init__ method that initiates a LSTM instance.
Args:
input_dim (int): The number of nodes in the input layer
hidden_dim (int): The number of nodes in each layer
layer_dim (int): The number of layers in the network
output_dim (int): The number of nodes in the output layer
dropout_prob (float): The probability of nodes being dropped out
"""
super(LSTMModel, self).__init__()
# Defining the number of layers and the nodes in each layer
self.hidden_dim = hidden_dim
self.layer_dim = layer_dim
# LSTM layers
self.lstm = nn.LSTM(
input_dim, hidden_dim, layer_dim, batch_first=True, dropout=dropout_prob
)
# Fully connected layer
self.fc = nn.Linear(hidden_dim, output_dim)
def forward(self, x):
"""The forward method takes input tensor x and does forward propagation
Args:
x (torch.Tensor): The input tensor of the shape (batch size, sequence length, input_dim)
Returns:
torch.Tensor: The output tensor of the shape (batch size, output_dim)
"""
# Initializing hidden state for first input with zeros
h0 = torch.zeros(self.layer_dim, x.size(0), self.hidden_dim).requires_grad_().to(device)
# Initializing cell state for first input with zeros
c0 = torch.zeros(self.layer_dim, x.size(0), self.hidden_dim).requires_grad_().to(device)
# We need to detach as we are doing truncated backpropagation through time (BPTT)
# If we don't, we'll backprop all the way to the start even after going through another batch
# Forward propagation by passing in the input, hidden state, and cell state into the model
out, (hn, cn) = self.lstm(x, (h0.detach(), c0.detach()))
# Selecting the output of the last time step, in the shape (batch_size, hidden_size),
# so that it can fit into the fully connected layer
out = out[:, -1, :]
# Convert the final state to our desired output shape (batch_size, output_dim)
out = self.fc(out)
return out
Gated Recurrent Unit (GRU)¶
The Gated Recurrent Unit (GRU) is a slightly more streamlined variant that offers comparable performance with considerably faster computation. Like LSTMs, GRUs capture long-term dependencies, but they do so using reset and update gates, without any cell state.
While the update gate determines how much of the past information to keep, the reset gate decides how much of it to forget. With fewer tensor operations, GRUs are often faster and require less memory than LSTMs. As you can see below, the model class is almost identical to the RNN’s.
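If you’re curious about that size difference, here’s a quick, self-contained sketch (the layer sizes are arbitrary) counting the parameters of PyTorch’s built-in LSTM and GRU layers. With three gate weight sets per layer instead of the LSTM’s four, the GRU should come out roughly 25% smaller.

```python
import torch.nn as nn

# Compare parameter counts of built-in LSTM and GRU layers at the same
# (arbitrary, illustrative) sizes. A GRU has 3 gate weight sets per layer
# versus the LSTM's 4, so it carries about a quarter fewer parameters.
def n_params(module):
    return sum(p.numel() for p in module.parameters())

lstm = nn.LSTM(input_size=1, hidden_size=64, num_layers=3, batch_first=True)
gru = nn.GRU(input_size=1, hidden_size=64, num_layers=3, batch_first=True)

print(n_params(gru) < n_params(lstm))  # True: fewer weights, less memory
```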
class GRUModel(nn.Module):
"""GRUModel class extends nn.Module class and works as a constructor for GRUs.
GRUModel class initiates a GRU module based on PyTorch's nn.Module class.
It has only two methods, namely __init__() and forward(). While the __init__()
method initiates the model with the given input parameters, the forward()
method defines how the forward propagation needs to be calculated.
Since PyTorch automatically handles backpropagation, there is no need
to define a backward method.
Attributes:
hidden_dim (int): The number of nodes in each layer
layer_dim (int): The number of layers in the network
gru (nn.GRU): The GRU model constructed with the input parameters.
fc (nn.Linear): The fully connected layer to convert the final state
of GRUs to our desired output shape.
"""
def __init__(self, input_dim, hidden_dim, layer_dim, output_dim, dropout_prob):
"""The __init__ method that initiates a GRU instance.
Args:
input_dim (int): The number of nodes in the input layer
hidden_dim (int): The number of nodes in each layer
layer_dim (int): The number of layers in the network
output_dim (int): The number of nodes in the output layer
dropout_prob (float): The probability of nodes being dropped out
"""
super(GRUModel, self).__init__()
# Defining the number of layers and the nodes in each layer
self.layer_dim = layer_dim
self.hidden_dim = hidden_dim
# GRU layers
self.gru = nn.GRU(
input_dim, hidden_dim, layer_dim, batch_first=True, dropout=dropout_prob
)
# Fully connected layer
self.fc = nn.Linear(hidden_dim, output_dim)
def forward(self, x):
"""The forward method takes input tensor x and does forward propagation
Args:
x (torch.Tensor): The input tensor of the shape (batch size, sequence length, input_dim)
Returns:
torch.Tensor: The output tensor of the shape (batch size, output_dim)
"""
# Initializing hidden state for first input with zeros
h0 = torch.zeros(self.layer_dim, x.size(0), self.hidden_dim).requires_grad_().to(device)
# Forward propagation by passing in the input and hidden state into the model
out, _ = self.gru(x, h0.detach())
# Selecting the output of the last time step, in the shape (batch_size, hidden_size),
# so that it can fit into the fully connected layer
out = out[:, -1, :]
# Convert the final state to our desired output shape (batch_size, output_dim)
out = self.fc(out)
return out
Similar to the trick we did with scalers, we can easily switch between the models we just created.
def get_model(model, model_params):
models = {
"rnn": RNNModel,
"lstm": LSTMModel,
"gru": GRUModel,
}
return models[model.lower()](**model_params)
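As a quick sanity check of this dispatch pattern, here’s a self-contained version using PyTorch’s built-in recurrent layers instead of the model classes above (the layer sizes are arbitrary):

```python
import torch.nn as nn

# Self-contained illustration of the same lookup-and-construct pattern,
# swapping architectures by changing a single string.
def get_layer(name, **kwargs):
    layers = {"rnn": nn.RNN, "lstm": nn.LSTM, "gru": nn.GRU}
    return layers[name.lower()](**kwargs)

layer = get_layer("GRU", input_size=1, hidden_size=8, num_layers=2, batch_first=True)
print(type(layer).__name__)  # GRU
```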
Now, it seems we have everything ready to train our RNN models. But where do we start?
Making predictions¶
Let’s start by creating the main framework for training the models. There are probably heaps of ways to do this, and one of them is to use a helper, or a wrapper, class that holds the training, validation, and evaluation methods. First, we need to have a model class, a loss function to calculate the losses, and an optimizer to update the weights in the network.
Helper/Wrapper Class for training¶
If you’re familiar with neural networks, you already know that training them is a rather repetitive process, looping back and forth between forward-prop and back-prop. I find it useful to have one level of abstraction, a train step function or wrapper, to combine these repetitive steps.
After defining one proper training step, we can move on to writing the training loop, where this step function will be called at each epoch. Each epoch has two stages: training and validation. After each training step, the network’s weights are tweaked a bit to minimize the loss function. Then, the validation step evaluates the current state of the model to see whether there has been any improvement after the most recent update.
As I’ll be using mini-batch training, a technique where only a portion of the data is used at each optimization step, there will be two for loops in each stage so that the model is trained and validated batch by batch. This usually requires reshaping each batch tensor into the correct input dimensions so the network can consume it.
Another important thing to note is to activate train() mode during training and eval() mode during validation. train() enables training-time behavior in layers such as dropout, while eval() turns it off. Note that eval() by itself does not disable gradient computation; that is what torch.no_grad() does, which is why the validation loop also runs inside a no_grad() block.
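To make the train()/eval() distinction concrete, here is a minimal sketch with a made-up network: eval() switches layer behavior (dropout becomes a no-op), while torch.no_grad() is what actually disables gradient tracking.

```python
import torch
import torch.nn as nn

# A minimal sketch with a made-up network: eval() changes layer behavior
# (here, dropout becomes the identity), while torch.no_grad() disables
# gradient tracking. The two are complementary, which is why validation
# uses both.
net = nn.Sequential(nn.Linear(4, 8), nn.Dropout(p=0.5), nn.Linear(8, 1))
x = torch.ones(2, 4)

net.eval()                 # dropout is now a no-op
with torch.no_grad():      # no computation graph is built
    y = net(x)

print(net.training, y.requires_grad)  # False False
```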
Now, we can finally train our model. However, without evaluating it on a separate test set, i.e., a hold-out set, it would be impossible to tell how it performs compared to the other models we’re building. Much like the validation loop in the train() method, we’ll define a testing method to evaluate our models as follows.
During the training, the loss function outputs are generally a good indicator of whether the model is learning, overfitting, or underfitting. For this reason, we’ll be plotting simple loss figures by using the following method.
class Optimization:
"""Optimization is a helper class that allows training, validation, prediction.
Optimization is a helper class that takes a model, a loss function, and an
optimizer as inputs. In return, it provides a framework to train and validate
the models, and to predict future values based on them.
Attributes:
model (RNNModel, LSTMModel, GRUModel): Model class created for the type of RNN
loss_fn (torch.nn.modules.Loss): Loss function to calculate the losses
optimizer (torch.optim.Optimizer): Optimizer function to optimize the loss function
train_losses (list[float]): The loss values from the training
val_losses (list[float]): The loss values from the validation
last_epoch (int): The number of epochs that the model has been trained for
"""
def __init__(self, model, loss_fn, optimizer):
"""
Args:
model (RNNModel, LSTMModel, GRUModel): Model class created for the type of RNN
loss_fn (torch.nn.modules.Loss): Loss function to calculate the losses
optimizer (torch.optim.Optimizer): Optimizer function to optimize the loss function
"""
self.model = model
self.loss_fn = loss_fn
self.optimizer = optimizer
self.train_losses = []
self.val_losses = []
def train_step(self, x, y):
"""The method train_step completes one step of training.
Given the features (x) and the target values (y) tensors, the method completes
one step of the training. First, it activates the train mode to enable back prop.
After generating predicted values (yhat) by doing forward propagation, it calculates
the losses by using the loss function. Then, it computes the gradients by doing
back propagation and updates the weights by calling step() function.
Args:
x (torch.Tensor): Tensor for features to train one step
y (torch.Tensor): Tensor for target values to calculate losses
"""
# Sets model to train mode
self.model.train()
# Makes predictions
yhat = self.model(x)
# Computes loss
loss = self.loss_fn(yhat, y)
# Computes gradients
loss.backward()
# Updates parameters and zeroes gradients
self.optimizer.step()
self.optimizer.zero_grad()
# Returns the loss
return loss.item()
def train(self, train_loader, val_loader, batch_size=64, n_epochs=50, n_features=1):
"""The method train performs the model training
The method takes DataLoaders for training and validation datasets, batch size for
mini-batch training, number of epochs to train, and number of features as inputs.
Then, it carries out the training by iteratively calling the method train_step for
n_epochs times. Finally, it saves the model in a designated file path.
Args:
train_loader (torch.utils.data.DataLoader): DataLoader that stores training data
val_loader (torch.utils.data.DataLoader): DataLoader that stores validation data
batch_size (int): Batch size for mini-batch training
n_epochs (int): Number of epochs to train for
n_features (int): Number of feature columns
"""
# Build a filesystem-safe checkpoint path from the model's class name and a timestamp
model_path = f'{type(self.model).__name__}_{datetime.now().strftime("%Y-%m-%d_%H-%M-%S")}'
for epoch in range(1, n_epochs + 1):
batch_losses = []
for x_batch, y_batch in train_loader:
x_batch = x_batch.view([batch_size, -1, n_features]).to(device)
y_batch = y_batch.to(device)
loss = self.train_step(x_batch, y_batch)
batch_losses.append(loss)
training_loss = np.mean(batch_losses)
self.train_losses.append(training_loss)
with torch.no_grad():
batch_val_losses = []
for x_val, y_val in val_loader:
x_val = x_val.view([batch_size, -1, n_features]).to(device)
y_val = y_val.to(device)
self.model.eval()
yhat = self.model(x_val)
val_loss = self.loss_fn(y_val, yhat).item()
batch_val_losses.append(val_loss)
validation_loss = np.mean(batch_val_losses)
self.val_losses.append(validation_loss)
if (epoch <= 10) or (epoch % 50 == 0):
print(
f"[{epoch}/{n_epochs}] Training loss: {training_loss:.4f}\t Validation loss: {validation_loss:.4f}"
)
torch.save(self.model.state_dict(), model_path)
def evaluate(self, test_loader, batch_size=1, n_features=1):
"""The method evaluate performs the model evaluation
The method takes DataLoaders for the test dataset, batch size for mini-batch testing,
and number of features as inputs. Similar to the model validation, it iteratively
predicts the target values and calculates losses. Then, it returns two lists that
hold the predictions and the actual values.
Note:
This method assumes that the prediction from the previous step is available at
the time of the prediction, and only does one-step prediction into the future.
Args:
test_loader (torch.utils.data.DataLoader): DataLoader that stores test data
batch_size (int): Batch size for mini-batch training
n_features (int): Number of feature columns
Returns:
list[float]: The values predicted by the model
list[float]: The actual values in the test set.
"""
with torch.no_grad():
predictions = []
values = []
for x_test, y_test in test_loader:
x_test = x_test.view([batch_size, -1, n_features]).to(device)
y_test = y_test.to(device)
self.model.eval()
yhat = self.model(x_test)
predictions.append(yhat.cpu().detach().numpy())
values.append(y_test.cpu().detach().numpy())
return predictions, values
def plot_losses(self):
"""The method plots the calculated loss values for training and validation
"""
plt.plot(self.train_losses, label="Training loss")
plt.plot(self.val_losses, label="Validation loss")
plt.legend()
plt.title("Losses")
plt.show()
plt.close()
Training the model¶
So far, we have prepared our dataset, defined our model classes and the wrapper class. We need to put all of them together. Without further ado, let’s start training our model.
import torch.optim as optim
input_dim = len(X_train.columns)
output_dim = 1
hidden_dim = 64
layer_dim = 3
batch_size = 64
dropout = 0.2
n_epochs = 50
learning_rate = 1e-3
weight_decay = 1e-6
model_params = {'input_dim': input_dim,
'hidden_dim' : hidden_dim,
'layer_dim' : layer_dim,
'output_dim' : output_dim,
'dropout_prob' : dropout}
model = get_model('lstm', model_params).to(device)
loss_fn = nn.MSELoss(reduction="mean")
optimizer = optim.Adam(model.parameters(), lr=learning_rate, weight_decay=weight_decay)
opt = Optimization(model=model, loss_fn=loss_fn, optimizer=optimizer)
opt.train(train_loader, val_loader, batch_size=batch_size, n_epochs=n_epochs, n_features=input_dim)
opt.plot_losses()
predictions, values = opt.evaluate(
test_loader_one,
batch_size=1,
n_features=input_dim
)
[1/50] Training loss: 0.0173 Validation loss: 0.0190
[2/50] Training loss: 0.0122 Validation loss: 0.0139
[3/50] Training loss: 0.0106 Validation loss: 0.0136
[4/50] Training loss: 0.0096 Validation loss: 0.0184
[5/50] Training loss: 0.0092 Validation loss: 0.0161
[6/50] Training loss: 0.0089 Validation loss: 0.0097
[7/50] Training loss: 0.0087 Validation loss: 0.0099
[8/50] Training loss: 0.0085 Validation loss: 0.0102
[9/50] Training loss: 0.0084 Validation loss: 0.0101
[10/50] Training loss: 0.0082 Validation loss: 0.0096
[50/50] Training loss: 0.0075 Validation loss: 0.0077
Formatting the predictions¶
As you may recall, we trained our network with standardized inputs; therefore, all the model’s predictions are also scaled. Also, since we used batching in our evaluation method, all of our predictions are now in batches. To calculate error metrics and plot these predictions, we first need to flatten these multi-dimensional tensors into a one-dimensional vector, and then apply inverse_transform() to recover the predictions’ real values.
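On toy batches (the values are made up), the flattening step looks like this:

```python
import numpy as np

# evaluate() returns a list of (batch_size, 1) arrays; concatenating them
# along axis 0 and calling ravel() yields a single one-dimensional vector.
batches = [np.array([[0.1], [0.2]]), np.array([[0.3]])]
flat = np.concatenate(batches, axis=0).ravel()
print(flat.shape)  # (3,)
```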
def inverse_transform(scaler, df, columns):
for col in columns:
df[col] = scaler.inverse_transform(df[col])
return df
def format_predictions(predictions, values, df_test, scaler):
vals = np.concatenate(values, axis=0).ravel()
preds = np.concatenate(predictions, axis=0).ravel()
df_result = pd.DataFrame(data={"value": vals, "prediction": preds}, index=df_test.head(len(vals)).index)
df_result = df_result.sort_index()
df_result = inverse_transform(scaler, df_result, [["value", "prediction"]])
return df_result
df_result = format_predictions(predictions, values, X_test, scaler)
df_result
| Datetime | value | prediction |
|---|---|---|
| 2015-04-09 15:00:00 | 32204.0 | 30085.351562 |
| 2015-04-09 16:00:00 | 32049.0 | 30082.298828 |
| 2015-04-09 17:00:00 | 32209.0 | 30092.333984 |
| 2015-04-09 18:00:00 | 32707.0 | 30150.080078 |
| 2015-04-09 19:00:00 | 33012.0 | 30265.785156 |
| ... | ... | ... |
| 2018-08-02 20:00:00 | 44057.0 | 45049.468750 |
| 2018-08-02 21:00:00 | 43256.0 | 43704.136719 |
| 2018-08-02 22:00:00 | 41552.0 | 41946.421875 |
| 2018-08-02 23:00:00 | 38500.0 | 39456.320312 |
| 2018-08-03 00:00:00 | 35486.0 | 35729.273438 |
29074 rows × 2 columns
Calculating error metrics¶
After flattening and de-scaling the values, we can now calculate error metrics, such as mean absolute error (MAE), mean squared error (MSE), and root mean squared error (RMSE).
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
def calculate_metrics(df):
result_metrics = {'mae' : mean_absolute_error(df.value, df.prediction),
'rmse' : mean_squared_error(df.value, df.prediction) ** 0.5,
'r2' : r2_score(df.value, df.prediction)}
print("Mean Absolute Error: ", result_metrics["mae"])
print("Root Mean Squared Error: ", result_metrics["rmse"])
print("R^2 Score: ", result_metrics["r2"])
return result_metrics
result_metrics = calculate_metrics(df_result)
Mean Absolute Error: 2955.338
Root Mean Squared Error: 3880.14059023639
R^2 Score: 0.6424345543424599
Generating baseline predictions¶
Having some sort of baseline model helps us gauge how well our models actually predict. For this task, I’ve chosen good old linear regression: good enough to produce a reasonable baseline, yet simple enough to do it very fast.
from sklearn.linear_model import LinearRegression
def build_baseline_model(df, test_ratio, target_col):
X, y = feature_label_split(df, target_col)
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=test_ratio, shuffle=False
)
model = LinearRegression()
model.fit(X_train, y_train)
prediction = model.predict(X_test)
result = pd.DataFrame(y_test)
result["prediction"] = prediction
result = result.sort_index()
return result
df_baseline = build_baseline_model(df_features, 0.2, 'value')
baseline_metrics = calculate_metrics(df_baseline)
Mean Absolute Error: 3652.5844053105866
Root Mean Squared Error: 4589.279608903664
R^2 Score: 0.4997931432605698
Visualizing the predictions¶
Last but not least, visualizing your results helps you better understand how your model performs and what kind of features would likely improve it. I’ll be using Plotly again, but feel free to use a package that you are more comfortable with.
import plotly.offline as pyo
import plotly.graph_objs as go
from plotly.offline import iplot
def plot_predictions(df_result, df_baseline):
data = []
value = go.Scatter(
x=df_result.index,
y=df_result.value,
mode="lines",
name="values",
marker=dict(),
text=df_result.index,
line=dict(color="rgba(0,0,0, 0.3)"),
)
data.append(value)
baseline = go.Scatter(
x=df_baseline.index,
y=df_baseline.prediction,
mode="lines",
line={"dash": "dot"},
name='linear regression',
marker=dict(),
text=df_baseline.index,
opacity=0.8,
)
data.append(baseline)
prediction = go.Scatter(
x=df_result.index,
y=df_result.prediction,
mode="lines",
line={"dash": "dot"},
name='predictions',
marker=dict(),
text=df_result.index,
opacity=0.8,
)
data.append(prediction)
layout = dict(
title="Predictions vs Actual Values for the dataset",
xaxis=dict(title="Time", ticklen=5, zeroline=False),
yaxis=dict(title="Value", ticklen=5, zeroline=False),
)
# fig = dict(data=data, layout=layout)
# iplot(fig)
fig = go.Figure(data=data, layout=layout)
fig.show(renderer="colab")
# Set notebook mode to work in offline
pyo.init_notebook_mode()
plot_predictions(df_result, df_baseline)
Where to next?¶
I’d like to say that was all, but there is, and certainly will be, more. Deep learning has been one of, if not the, most fruitful research areas in machine learning. Research on sequential deep learning models is growing and will likely keep growing in the future. You may consider this post the first step in exploring what these techniques have to offer for time-series forecasting.
There are still a few more topics I’d like to write about, such as forecasting future time steps using time-lagged and datetime features, regularization techniques (some of which we have already used in this post), and more advanced deep learning architectures for time series. And the list goes on. Let’s hope that my motivation to keep going lives up to such ambitions. But, for now, I’d say that’s a wrap.